Should We Have Blind Faith in Bioinformatics Software? Illustrations from the SNAP Web-Based Tool
Abstract
Bioinformatics tools have gained popularity in biology, but little is known about their validity. We aimed to assess the early contribution of 415 single nucleotide polymorphisms (SNPs) associated with eight cardio-metabolic traits at the genome-wide significance level in adults in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs in the Illumina Cardio-Metabochip genotyped in the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina, using the chromosome and chromosomal position of each SNP from the NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, the SNAP output reported 201 of the 415 SNPs as missing from the Cardio-Metabochip. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present in the Cardio-Metabochip array (false-negative rate of 36.6%). With the more recent 1000 Genomes Project release, we found a false-negative rate of 17.6% by comparing the outputs of SNAP and the Illumina product file. We did not find any 'false positive' SNPs (SNPs specified as available in the Cardio-Metabochip by SNAP, but not by the Cardio-Metabochip Illumina file). Cohen's kappa coefficient, which measures the agreement between the two methods beyond that expected by chance, indicated that the validity of SNAP was fair to moderate depending on the reference used (HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically and independently assessing the validity of bioinformatics tools. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation.
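The HapMap 3 figures in the abstract can be checked with a short calculation. The sketch below is an illustration only, not the authors' analysis code: it rebuilds a 2x2 agreement table from the counts stated above (415 SNPs queried, 201 reported missing by SNAP, 152 of those actually present, no false positives) and computes the false-negative rate and Cohen's kappa from it. The function and variable names are hypothetical, and the kappa value it prints comes from this reconstruction rather than from the paper itself.

```python
def cohen_kappa(both_present, tool_only, file_only, both_absent):
    """Cohen's kappa for a 2x2 agreement table between two binary classifications."""
    n = both_present + tool_only + file_only + both_absent
    # Observed agreement: cells where SNAP and the Illumina file agree.
    observed = (both_present + both_absent) / n
    # Marginal proportion of "present" calls for each source.
    tool_present = (both_present + tool_only) / n
    file_present = (both_present + file_only) / n
    # Agreement expected by chance, given those marginals.
    expected = (tool_present * file_present
                + (1 - tool_present) * (1 - file_present))
    return (observed - expected) / (1 - expected)


# Counts taken from the abstract (HapMap 3 release 2 reference).
total_snps = 415
reported_missing = 201   # SNAP says "not on the array"
false_negatives = 152    # ...but the Illumina file lists them
false_positives = 0      # none were found

# Derived cells (assumption: every SNP is in exactly one cell).
both_present = total_snps - reported_missing - false_positives
both_absent = reported_missing - false_negatives

# Dividing by all queried SNPs reproduces the 36.6% quoted above.
fn_rate = false_negatives / total_snps
kappa = cohen_kappa(both_present, false_positives,
                    false_negatives, both_absent)

print(f"False-negative rate: {fn_rate:.1%}")
print(f"Cohen's kappa (reconstructed): {kappa:.2f}")
```

Note that the 36.6% figure uses all 415 queried SNPs as the denominator; dividing by only the SNPs that are truly on the array would give a different rate.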